





This section covers how to load HTML documents from a list of URLs so that they can be used downstream.

```python
from langchain.document_loaders import UnstructuredURLLoader

```

```python
urls = [
    "https://www.understandingwar.org/backgrounder/russian-offensive-campaign-assessment-february-8-2023",
    "https://www.understandingwar.org/backgrounder/russian-offensive-campaign-assessment-february-9-2023"
]

```

```python
loader = UnstructuredURLLoader(urls=urls)

```

```python
data = loader.load()

```
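To show what "use downstream" can look like, here is a minimal sketch that inspects loaded documents. It assumes each item in `data` exposes a `page_content` string and a `metadata` dict with a `source` key, as LangChain documents do. The `Document` dataclass and the `summarize` helper below are hypothetical stand-ins so the example runs without network access; with the real loader you would pass `data` instead of `sample`.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Stand-in for a loaded document (assumption: same two fields)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def summarize(docs, preview_chars=40):
    """Return one (source, length, preview) tuple per document."""
    rows = []
    for doc in docs:
        source = doc.metadata.get("source", "<unknown>")
        # Collapse runs of whitespace so lengths and previews are comparable.
        text = " ".join(doc.page_content.split())
        rows.append((source, len(text), text[:preview_chars]))
    return rows


# Placeholder documents, not real loader output.
sample = [
    Document("First   page\ntext.", {"source": "https://example.com/a"}),
    Document("Second page text.", {"source": "https://example.com/b"}),
]

for source, length, preview in summarize(sample):
    print(f"{source}: {length} chars, starts with {preview!r}")
```

A quick summary like this is a cheap way to spot URLs that returned empty or unexpectedly short content before passing the documents on.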

Selenium URL Loader
===================

This section covers how to load HTML documents from a list of URLs using `SeleniumURLLoader`.

Using Selenium makes it possible to load pages that require JavaScript to render.

Setup
-----

To use `SeleniumURLLoader`, you need to install `selenium` and `unstructured`.
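Before constructing the loader, it can help to confirm that both packages are importable in the current environment. The following is a small sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of LangChain.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(["selenium", "unstructured"])
if missing:
    print("Install first: pip install " + " ".join(missing))
else:
    print("selenium and unstructured are available")
```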

```python
from langchain.document_loaders import SeleniumURLLoader

```

```python
urls = [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "https://goo.gl/maps/NDSHwePEyaHMFGwh8"
]

```

```python
loader = SeleniumURLLoader(urls=urls)

```

```python
data = loader.load()

```

Playwright URL Loader
=====================

This section covers how to load HTML documents from a list of URLs using `PlaywrightURLLoader`.

As with Selenium, Playwright lets us load pages that require JavaScript to render.

Setup
-----

To use `PlaywrightURLLoader`, you need to install `playwright` and `unstructured`.

You also need to install the Playwright Chromium browser:

```python
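# Install playwright and unstructured, then download the Playwright browsers
# (the leading "!" runs these as shell commands from a notebook cell)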
!pip install "playwright"
!pip install "unstructured"
!playwright install

```

```python
from langchain.document_loaders import PlaywrightURLLoader

```

```python
urls = [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "https://goo.gl/maps/NDSHwePEyaHMFGwh8"
]

```

```python
loader = PlaywrightURLLoader(urls=urls, remove_selectors=["header", "footer"])

```

```python
data = loader.load()

```
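The `remove_selectors=["header", "footer"]` argument above asks the loader to drop matching elements before the page text is extracted. The sketch below illustrates that idea on a static HTML string using only the standard library. It is not the loader's implementation: the real loader works on the rendered page in a browser and accepts CSS selectors, whereas this hypothetical `visible_text` helper only handles bare tag names.

```python
from html.parser import HTMLParser


class TagStrippingExtractor(HTMLParser):
    """Collect text, skipping anything nested inside the given tags."""

    def __init__(self, skip_tags):
        super().__init__()
        self.skip_tags = set(skip_tags)
        self.depth = 0      # how many skipped tags we are currently inside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.skip_tags:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.skip_tags and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside every skipped tag.
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html, skip_tags=("header", "footer")):
    """Return the text of `html` with the skipped tags' content removed."""
    parser = TagStrippingExtractor(skip_tags)
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# Placeholder page, not real loader input.
sample_html = (
    "<html><body><header><nav>Menu</nav></header>"
    "<main><p>Article body.</p></main>"
    "<footer>Copyright</footer></body></html>"
)

print(visible_text(sample_html))
```

Dropping navigation and footer elements this way keeps repeated site chrome out of the extracted text, which tends to matter when the documents are later split or embedded.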

